Fix MATCH after CREATE returning 0 rows (issue #2308) #2340
When a MATCH clause follows CREATE + WITH and re-uses bound variables (e.g. CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b)), the MATCH generates filter quals (age_start_id(e) = age_id(a), etc.) that reference only columns from the predecessor subquery. PostgreSQL's optimizer pushes these quals through the transparent subquery layers into the CREATE's child plan, where they evaluate on NULL values before CREATE has executed, always yielding 0 rows.

Fix: mark the predecessor subquery RTE as security_barrier when the clause chain contains a data-modifying operation (CREATE, SET, DELETE, or MERGE). This prevents PostgreSQL from pushing filter quals into the subquery, ensuring they evaluate after the DML produces output values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…che#2193)

When CREATE introduces a new label and a subsequent MATCH references it (e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns 0 rows on first execution but works on the second.

Root cause: match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. Since CREATE has not yet executed its transform (which creates the label table as a side effect), the label is not in the cache and the check generates a One-Time Filter: false plan that returns no rows.

Fix: Skip the early label validity check when the predecessor clause chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE). After transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If the labels are still invalid at that point, generate an empty result via makeBoolConst(false).

This preserves the existing behavior for MATCH without DML predecessors (e.g., MATCH-MATCH chains still get the early check and proper error messages for invalid labels).

Depends on: PR apache#2340 (clause_chain_has_dml helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
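The ordering problem in this commit message can be modelled in a few lines of C. This is a toy sketch under stated assumptions, not AGE code: label_cache, transform_create() and match_is_empty() are hypothetical names, and the label cache is reduced to an array of strings. With defer_check off, the label check runs before the predecessor CREATE has registered its label; with it on, the check runs afterwards.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LABELS 8

/* Hypothetical label cache: a fixed array of known label names. */
typedef struct {
    const char *names[MAX_LABELS];
    int count;
} label_cache;

bool label_exists(const label_cache *cache, const char *label)
{
    for (int i = 0; i < cache->count; i++)
        if (strcmp(cache->names[i], label) == 0)
            return true;
    return false;
}

/* Models the side effect of transforming CREATE: the label gets registered. */
void transform_create(label_cache *cache, const char *label)
{
    if (!label_exists(cache, label) && cache->count < MAX_LABELS)
        cache->names[cache->count++] = label;
}

/*
 * Returns true when MATCH on match_label would be planned as "no rows".
 * prev_create_label is the label a preceding CREATE introduces,
 * or NULL when the predecessor chain contains no DML.
 */
bool match_is_empty(label_cache *cache, const char *prev_create_label,
                    const char *match_label, bool defer_check)
{
    bool has_dml = (prev_create_label != NULL);
    bool invalid = false;

    /* Early check: runs before the predecessor chain is transformed. */
    if (!(defer_check && has_dml))
        invalid = !label_exists(cache, match_label);

    /* Transform the predecessor chain (may create the label). */
    if (has_dml)
        transform_create(cache, prev_create_label);

    /* Deferred check: runs only when the early check was skipped. */
    if (defer_check && has_dml)
        invalid = !label_exists(cache, match_label);

    return invalid;
}
```

In this model, calling match_is_empty() twice on the same cache with defer_check off reports an empty result the first time and a non-empty one the second time, which mirrors the "0 rows on first execution but works on the second" symptom.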
Pull request overview
Fixes a planner/optimizer interaction in Apache AGE where a MATCH following CREATE ... WITH ... (reusing bound variables) could have its generated filter quals pushed below the DML plan, preventing the DML from executing and causing the query to return 0 rows.
Changes:
- Mark the predecessor subquery RTE as a PostgreSQL security_barrier when the predecessor clause chain includes DML (CREATE/SET/DELETE/MERGE), preventing qual pushdown into the DML’s child plan.
- Add a helper (clause_chain_has_dml()) to detect DML operations in the clause chain.
- Add regression coverage for issue #2308 and corresponding expected output.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/backend/parser/cypher_clause.c | Adds DML detection and sets security_barrier on the predecessor subquery RTE to prevent incorrect qual pushdown past DML. |
| regress/sql/cypher_match.sql | Adds regression queries covering CREATE+WITH+MATCH (and related variations) for issue #2308. |
| regress/expected/cypher_match.out | Captures expected outputs for the new regression cases. |
@gregfelice Please see Copilot's comment above. Thoughts?

Hi jrgemignani,

Copilot raises whether This isn't a concern in practice: Cypher subquery expressions ( Additionally, subqueries are transformed into standalone If a future grammar extension ever allows DML inside subqueries, the helper would need updating, but that would be a much larger change with its own design considerations.

Thanks,
Greg
* Fix MATCH on brand-new label after CREATE returning 0 rows (issue #2193)

  When CREATE introduces a new label and a subsequent MATCH references it (e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns 0 rows on first execution but works on the second.

  Root cause: match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. Since CREATE has not yet executed its transform (which creates the label table as a side effect), the label is not in the cache and the check generates a One-Time Filter: false plan that returns no rows.

  Fix: Skip the early label validity check when the predecessor clause chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE). After transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If the labels are still invalid at that point, generate an empty result via makeBoolConst(false).

  This preserves the existing behavior for MATCH without DML predecessors (e.g., MATCH-MATCH chains still get the early check and proper error messages for invalid labels).

  Depends on: PR #2340 (clause_chain_has_dml helper)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address review feedback: fix variable registration for deferred label check

  When the deferred label validity check (DML predecessor + non-existent label) found an invalid label, the code skipped transform_match_pattern() entirely, which meant MATCH-introduced variables were never registered in the namespace. This would cause errors if a later clause referenced those variables (e.g., RETURN p).

  Fix: mirror the early-check strategy by injecting a paradoxical WHERE (true = false) and always calling transform_match_pattern(). Variables get registered normally; zero rows are returned via the impossible qual.

  Also add ORDER BY to multi-row regression tests for deterministic output, and add a test case for DML predecessor + non-existent label + returning a MATCH-introduced variable.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address Copilot review: DRY false-where helper, cache has_dml, ORDER BY in tests

  - Factor duplicated WHERE true=false construction into make_false_where_clause() helper (used in both early and deferred label validation paths)
  - Compute clause_chain_has_dml() once and reuse, avoiding repeated clause chain traversal
  - Add ORDER BY to the single-CREATE City regression test for deterministic result ordering

* Address Copilot review: volatile false predicate, DML side-effect test

  1. Prevent plan elimination of DML predecessor: replace constant (true = false) with volatile (random() IS NULL) in the deferred label check path. PG's planner can constant-fold the former into a One-Time Filter: false, skipping the DML scan entirely.
  2. Unify make_false_where_clause(bool volatile_needed): merge the constant and volatile variants into a single parameterized function. Call sites are now self-documenting:
     - make_false_where_clause(false) for non-DML path
     - make_false_where_clause(true) for DML predecessor path
  3. Document why add_volatile_wrapper() cannot be reused here (it operates post-transform at the Expr level and returns agtype, while the WHERE clause is built at the parse-tree level).
  4. Add regression test verifying CREATE side effects persist when MATCH references a non-existent label after a DML predecessor.

  All regression tests pass (cypher_match: ok).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Replace non-ASCII em dashes with -- in C comments

  ASCII-only codebase convention; avoids encoding/tooling issues.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
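The reason the DML path needs a volatile always-false predicate can be illustrated with a toy planner model in C. Everything here is hypothetical and heavily simplified: pred_kind, toy_plan, toy_planner() and toy_execute() are illustration-only names, not PostgreSQL or AGE APIs, and make_false_pred() only mirrors the choice that make_false_where_clause(bool volatile_needed) is described as making.

```c
#include <stdbool.h>

/* Hypothetical, simplified predicate kinds; not PostgreSQL node types. */
typedef enum {
    PRED_CONST_FALSE,    /* models (true = false): foldable at plan time */
    PRED_VOLATILE_FALSE  /* models (random() IS NULL): false, but volatile */
} pred_kind;

typedef struct {
    bool dml_scan_pruned;   /* planner dropped the input that holds the DML */
    bool filter_at_runtime; /* predicate is kept for per-row evaluation */
} toy_plan;

/* Mirrors the parameter described for make_false_where_clause(). */
pred_kind make_false_pred(bool volatile_needed)
{
    return volatile_needed ? PRED_VOLATILE_FALSE : PRED_CONST_FALSE;
}

/* Toy planner: a constant-false qual is folded and prunes its input;
 * a volatile qual cannot be folded, so the input scan survives. */
toy_plan toy_planner(pred_kind pred)
{
    toy_plan plan;

    if (pred == PRED_CONST_FALSE) {
        plan.dml_scan_pruned = true;
        plan.filter_at_runtime = false;
    } else {
        plan.dml_scan_pruned = false;
        plan.filter_at_runtime = true;
    }
    return plan;
}

/* Toy executor: returns the number of rows emitted and stores in *created
 * how many entities the DML predecessor inserted. */
int toy_execute(toy_plan plan, int input_rows, int *created)
{
    *created = 0;
    if (plan.dml_scan_pruned)
        return 0;           /* DML never runs: its side effects are lost */
    *created = input_rows;  /* DML runs once per input row */
    return 0;               /* the false predicate then rejects every row */
}
```

Both variants return zero rows; the difference in this model is only whether the CREATE side effects happen, which is what the added regression test is described as checking.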
Problem
The query
CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b) SET a.something = 'something' RETURN a
returns 0 rows instead of the created vertex.

Reported in #2308.
Root Cause
When MATCH follows CREATE + WITH and re-uses all bound variables, the MATCH generates filter quals (age_start_id(e) = age_id(a) AND age_end_id(e) = age_id(b)) that reference only columns from the predecessor subquery. Since the MATCH adds no new table scans (all entities are bound), PostgreSQL's optimizer treats the subquery as transparent and pushes these filter quals down through the subquery layers into the CREATE's child plan.

Before the fix, the EXPLAIN plan shows the bug:
The filter is placed below the CREATE custom scan, on its child subquery. At execution time:
- Result produces a dummy row with NULL values
- age_start_id(NULL) = age_id(NULL) → fails

The filter should evaluate after CREATE produces its output (where a, e, b have actual values), not before.

Fix
In transform_cypher_match_pattern(), after transforming the predecessor clause chain into a subquery RTE, check if the chain contains any data-modifying operation (CREATE, SET, DELETE, or MERGE). If it does, set rte->security_barrier = true on the subquery RTE. This is PostgreSQL's standard mechanism to prevent qual pushdown through subqueries: the optimizer will not flatten a security-barrier subquery, and it will not push a filter condition into it unless the condition is leakproof.

A helper function clause_chain_has_dml() walks the clause chain to detect DML operations.

After the fix, the EXPLAIN plan shows the correct structure:
Now the filter is placed above the CREATE custom scan. CREATE runs first (inserting entities), then the filter evaluates on the output values and correctly passes.
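The detection-plus-flag logic described above can be sketched in standalone C. This is a minimal model, not the code in cypher_clause.c: the clause and subquery_rte types, the clause_kind enum, and mark_barrier_if_dml() are hypothetical stand-ins, and only the name clause_chain_has_dml() and the security_barrier flag come from the PR text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for AGE's parse-tree types; the real structs differ. */
typedef enum {
    CLAUSE_MATCH,
    CLAUSE_WITH,
    CLAUSE_RETURN,
    CLAUSE_CREATE,
    CLAUSE_SET,
    CLAUSE_DELETE,
    CLAUSE_MERGE
} clause_kind;

typedef struct clause {
    clause_kind kind;
    struct clause *prev;  /* predecessor in the chain, NULL at the head */
} clause;

/* Simplified subquery range-table entry: only the flag the fix sets. */
typedef struct {
    bool security_barrier;
} subquery_rte;

/* Return true if this clause or any of its predecessors modifies data. */
bool clause_chain_has_dml(const clause *c)
{
    for (; c != NULL; c = c->prev) {
        switch (c->kind) {
        case CLAUSE_CREATE:
        case CLAUSE_SET:
        case CLAUSE_DELETE:
        case CLAUSE_MERGE:
            return true;
        default:
            break;
        }
    }
    return false;
}

/* Mark the predecessor subquery as a barrier only when it contains DML,
 * so read-only chains keep their normal qual pushdown. */
void mark_barrier_if_dml(subquery_rte *rte, const clause *prev_chain)
{
    if (clause_chain_has_dml(prev_chain))
        rte->security_barrier = true;
}
```

Setting the flag conditionally, rather than on every predecessor subquery, keeps the change scoped to chains where evaluation order matters; MATCH-MATCH chains are left untouched in this sketch.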
Files Changed
- src/backend/parser/cypher_clause.c: Added clause_chain_has_dml() helper function and security_barrier logic in transform_cypher_match_pattern()
- regress/sql/cypher_match.sql: Regression tests for issue #2308 (MATCH after CREATE does not return the newly created row)
- regress/expected/cypher_match.out: Expected test output

Regression Tests
Added tests covering:
- CREATE + WITH + MATCH + SET + RETURN (1 row expected)

All 31 existing regression tests pass with this change.
Closes #2308.
AI Disclosure
AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.